
    Deep Projective 3D Semantic Segmentation

    Semantic segmentation of 3D point clouds is a challenging problem with numerous real-world applications. While deep learning has revolutionized the field of image semantic segmentation, its impact on point cloud data has been limited so far. Recent attempts, based on 3D deep learning approaches (3D-CNNs), have achieved below-expected results. Such methods require voxelizations of the underlying point cloud data, leading to decreased spatial resolution and increased memory consumption. Additionally, 3D-CNNs greatly suffer from the limited availability of annotated datasets. In this paper, we propose an alternative framework that avoids the limitations of 3D-CNNs. Instead of directly solving the problem in 3D, we first project the point cloud onto a set of synthetic 2D images. These images are then used as input to a 2D-CNN, designed for semantic segmentation. Finally, the obtained prediction scores are re-projected to the point cloud to obtain the segmentation results. We further investigate the impact of multiple modalities, such as color, depth and surface normals, in a multi-stream network architecture. Experiments are performed on the recent Semantic3D dataset. Our approach sets a new state-of-the-art by achieving a relative gain of 7.9%, compared to the previous best approach.
    Comment: Submitted to CAIP 201
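
    As a rough illustration of the project / predict / re-project pipeline sketched in this abstract, the NumPy snippet below shows how points could be projected into a synthetic pinhole view and how per-pixel class scores could be transferred back to the points. The camera parameters, function names, and the placeholder score map are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of the project / re-project idea; not the paper's code.
import numpy as np

def project_points(points, K, R, t, image_shape):
    """Project Nx3 world points into an HxW pinhole view.

    Returns integer pixel coordinates and the indices of the points
    that actually land inside the image.
    """
    cam = (R @ points.T + t[:, None]).T            # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6                    # keep points in front of the camera
    uv_h = (K @ cam[in_front].T).T                 # homogeneous pixel coordinates
    uv = (uv_h[:, :2] / uv_h[:, 2:3]).astype(int)  # perspective divide
    h, w = image_shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[inside], np.flatnonzero(in_front)[inside]

def labels_from_scores(score_map, uv, point_idx, num_points):
    """Re-project per-pixel class scores (HxWxC) back onto the point cloud."""
    num_classes = score_map.shape[-1]
    point_scores = np.zeros((num_points, num_classes))
    point_scores[point_idx] = score_map[uv[:, 1], uv[:, 0]]  # pixel -> point transfer
    # Points that were never projected keep all-zero scores and fall back to class 0.
    return point_scores.argmax(axis=1)
```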

    Feature-Guided Black-Box Safety Testing of Deep Neural Networks

    Despite the improved accuracy of deep neural networks, the discovery of adversarial examples has raised serious safety concerns. Most existing approaches for crafting adversarial examples necessitate some knowledge (architecture, parameters, etc.) of the network at hand. In this paper, we focus on image classifiers and propose a feature-guided black-box approach to test the safety of deep neural networks that requires no such knowledge. Our algorithm employs object detection techniques such as SIFT (Scale Invariant Feature Transform) to extract features from an image. These features are converted into a mutable saliency distribution, where high probability is assigned to pixels that affect the composition of the image with respect to the human visual system. We formulate the crafting of adversarial examples as a two-player turn-based stochastic game, where the first player's objective is to minimise the distance to an adversarial example by manipulating the features, and the second player can be cooperative, adversarial, or random. We show that, theoretically, the two-player game can converge to the optimal strategy, and that the optimal strategy represents a globally minimal adversarial image. For Lipschitz networks, we also identify conditions that provide safety guarantees that no adversarial examples exist. Using Monte Carlo tree search we gradually explore the game state space to search for adversarial examples. Our experiments show that, despite the black-box setting, manipulations guided by a perception-based saliency distribution are competitive with state-of-the-art methods that rely on white-box saliency matrices or sophisticated optimization procedures. Finally, we show how our method can be used to evaluate robustness of neural networks in safety-critical applications such as traffic sign recognition in self-driving cars.
    Comment: 35 pages, 5 tables, 23 figure
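
    One ingredient described above, turning detected features into a saliency distribution over pixels, can be sketched as follows. This is a hedged illustration using OpenCV's SIFT detector; weighting each keypoint by its response and spreading it with a Gaussian at the keypoint's scale is an assumed choice, not necessarily the paper's exact construction.

```python
# Illustrative saliency map from SIFT keypoints; assumed construction, not the paper's.
import cv2
import numpy as np

def keypoint_saliency(gray_image):
    """Return an HxW distribution (sums to 1) peaked around SIFT keypoints."""
    sift = cv2.SIFT_create()
    keypoints = sift.detect(gray_image, None)          # gray_image: uint8, single channel
    saliency = np.full(gray_image.shape, 1e-8, dtype=np.float64)  # small floor everywhere
    yy, xx = np.mgrid[0:gray_image.shape[0], 0:gray_image.shape[1]]
    for kp in keypoints:
        x, y = kp.pt
        sigma = max(kp.size, 1.0)                       # spread mass over the keypoint's scale
        saliency += kp.response * np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    return saliency / saliency.sum()                    # normalise into a distribution

# Pixels can then be sampled with probability proportional to this map when
# proposing manipulations during the two-player game / MCTS search.
```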

    Generalized Multi-Camera Scene Reconstruction Using Graph Cuts

    Reconstructing a 3-D scene from more than one camera is a classical problem in computer vision. One of the major sources of difficulty is the fact that not all scene elements are visible from all cameras. In the last few years, two promising approaches have been developed [. . .] that formulate the scene reconstruction problem in terms of energy minimization, and minimize the energy using graph cuts. These energy minimization approaches treat the input images symmetrically, handle visibility constraints correctly, and allow spatial smoothness to be enforced. However, these algorithms propose different problem formulations, and handle a limited class of smoothness terms. One algorithm [. . .] uses a problem formulation that is restricted to two-camera stereo, and imposes smoothness between a pair of cameras. The other algorithm [. . .] can handle an arbitrary number of cameras, but imposes smoothness only with respect to a single camera. In this paper we give a more general energy minimization formulation for the problem, which allows a larger class of spatial smoothness constraints. We show that our formulation includes both of the previous approaches as special cases, while also permitting new energy functions. Experimental results on real data with ground truth are also included.
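
    For readers unfamiliar with these formulations, the energies being minimized typically have the schematic form below, written here in notation chosen for illustration: a photo-consistency data term over sites, a spatial smoothness term over neighboring sites, and hard (infinite-cost) terms that rule out configurations violating visibility. The exact terms and the class of allowed smoothness functions are precisely what differ between the formulations discussed.

```latex
% Schematic graph-cut energy; notation illustrative, not taken from the paper.
E(f) \;=\; \sum_{p \in \mathcal{P}} D_p\bigl(f(p)\bigr)
      \;+\; \sum_{\{p,q\} \in \mathcal{N}} V_{p,q}\bigl(f(p), f(q)\bigr)
      \;+\; \sum_{p \in \mathcal{P}} \infty \cdot \bigl[\text{$f$ violates visibility at } p\bigr]
```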

    Using strong shape priors for stereo

    This paper addresses the problem of obtaining an accurate 3D reconstruction from multiple views. Taking inspiration from the recent successes of using strong prior knowledge for image segmentation, we propose a framework for 3D reconstruction which uses such priors to overcome the ambiguity inherent in this problem. Our framework is based on an object-specific Markov Random Field (MRF) [10]. It uses a volumetric scene representation and integrates conventional reconstruction measures such as photo-consistency, surface smoothness and visual hull membership with a strong object-specific prior. Simple parametric models of objects will be used as strong priors in our framework. We will show how parameters of these models can be efficiently estimated by performing inference on the MRF using dynamic graph cuts [7]. This procedure not only gives an accurate object reconstruction, but also provides us with information regarding the pose or state of the object being reconstructed. We will show the results of our method in reconstructing deformable and articulated objects.
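
    One possible schematic form of the object-specific MRF energy described above, with notation chosen here for illustration: unary terms for photo-consistency and visual hull membership, a pairwise smoothness term, and a prior term coupling the voxel occupancies x_v to the pose/shape parameters \theta of the parametric model. Estimating \theta then amounts to inference over this energy, which the paper performs with dynamic graph cuts.

```latex
% Illustrative object-specific MRF energy; the notation is ours, not the paper's.
E(\mathbf{x}, \theta) \;=\;
      \sum_{v} \Bigl( \phi_{\text{photo}}(x_v) + \phi_{\text{hull}}(x_v)
                      + \phi_{\text{prior}}(x_v \mid \theta) \Bigr)
      \;+\; \sum_{(u,v) \in \mathcal{N}} \psi_{\text{smooth}}(x_u, x_v)
```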

    Video collections in panoramic contexts


    A high-quality video denoising algorithm based on reliable motion estimation

    11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5-11, 2010, Proceedings, Part III
    Although the recent advances in the sparse representations of images have achieved outstanding denoising results, removing real, structured noise in digital videos remains a challenging problem. We show the utility of reliable motion estimation to establish temporal correspondence across frames in order to achieve high-quality video denoising. In this paper, we propose an adaptive video denoising framework that integrates robust optical flow into a non-local means (NLM) framework with noise level estimation. The spatial regularization in optical flow is the key to ensure temporal coherence in removing structured noise. Furthermore, we introduce approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods. Experimental results show that our system is comparable with the state of the art in removing AWGN, and significantly outperforms the state of the art in removing real, structured noise.
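
    The temporal non-local means averaging this abstract builds on can be sketched as below: given a reference patch and K candidate patches gathered from motion-compensated neighbouring frames (the optical flow and approximate K-nearest-neighbor search are not shown), the center pixel is replaced by a similarity-weighted average, with the filtering parameter h tied to the estimated noise level. Names and the exact weighting rule are illustrative assumptions.

```python
# Minimal NLM averaging step; assumed weighting, not the paper's exact scheme.
import numpy as np

def nlm_pixel(ref_patch, candidate_patches, candidate_centers, h):
    """Denoise one pixel from K candidate patches.

    ref_patch: (p, p) noisy reference patch
    candidate_patches: (K, p, p) patches found across motion-compensated frames
    candidate_centers: (K,) center-pixel values of those patches
    h: filtering parameter, typically proportional to the estimated noise level
    """
    d2 = np.mean((candidate_patches - ref_patch) ** 2, axis=(1, 2))  # patch distances
    w = np.exp(-d2 / (h ** 2))                                       # similarity weights
    return np.sum(w * candidate_centers) / np.sum(w)                 # weighted average
```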

    Casual 3D photography

    We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo, a central panoramic, textured, normal-mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for being rendered from viewpoints that are near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also allows interacting with the scene using 3D geometry-aware effects, such as adding new objects to the scene and artistic lighting effects. Our 3D photo reconstruction algorithm starts with a standard structure from motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to the imperfect capture conditions using a novel near envelope cost volume prior that discards erroneous near depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.
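
    One sub-step implied above, mapping 3D points (e.g. unprojected from a per-view depth map) into a central panorama, can be sketched with an equirectangular parameterization. Both the parameterization and the function below are assumptions made for illustration; the paper's actual representation is a richer multi-layered, normal-mapped mesh.

```python
# Hedged sketch: map 3D points into a central equirectangular panorama.
import numpy as np

def to_equirectangular(points, center, pano_width, pano_height):
    """Map Nx3 points to (u, v, depth) in an equirectangular panorama at 'center'."""
    rel = points - center                          # rays from the panorama centre
    depth = np.linalg.norm(rel, axis=1)
    x, y, z = rel[:, 0], rel[:, 1], rel[:, 2]
    lon = np.arctan2(x, z)                         # azimuth in [-pi, pi]
    lat = np.arcsin(np.clip(y / np.maximum(depth, 1e-9), -1.0, 1.0))  # elevation
    u = (lon / (2 * np.pi) + 0.5) * pano_width     # horizontal panorama coordinate
    v = (lat / np.pi + 0.5) * pano_height          # vertical panorama coordinate
    return u, v, depth
```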

    Where is cognition? Towards an embodied, situated, and distributed interactionist theory of cognitive activity

    In recent years researchers from a variety of cognitive science disciplines have begun to challenge some of the core assumptions of the dominant theoretical framework of cognitivism, including the representation-computational view of cognition, the sense-model-plan-act understanding of cognitive architecture, and the use of a formal task description strategy for investigating the organisation of internal mental processes. Challenges to these assumptions are illustrated using empirical findings and theoretical arguments from fields such as situated robotics, dynamical systems approaches to cognition, situated action and distributed cognition research, and sociohistorical studies of cognitive development. Several shared themes are extracted from the findings in these research programmes, including: a focus on agent-environment systems as the primary unit of analysis; an attention to agent-environment interaction dynamics; a vision of the cognizer's internal mechanisms as essentially reactive and decentralised in nature; and a tendency for mutual definitions of agent, environment, and activity. It is argued that, taken together, these themes signal the emergence of a new approach to cognition called embodied, situated, and distributed interactionism. This interactionist alternative has many resonances with the dynamical systems approach to cognition. However, this approach does not provide a theory of the implementing substrate sufficient for an interactionist theoretical framework. It is suggested that such a theory can be found in a view of animals as autonomous systems coupled with a portrayal of the nervous system as a regulatory, coordinative, and integrative bodily subsystem. Although a number of recent simulations show connectionism's promise as a computational technique in simulating the role of the nervous system from an interactionist perspective, this embodied connectionist framework does not lend itself to understanding the advanced 'representation hungry' cognition we witness in much human behaviour. It is argued that this problem can be solved by understanding advanced cognition as the re-use of basic perception-action skills and structures, and that this feat is enabled by a general education within a social symbol-using environment.

    Distortion Estimation Through Explicit Modeling of the Refractive Surface

    Precise calibration is a must for highly reliable 3D computer vision algorithms. A challenging case is when the camera is behind a protective glass or transparent object: due to refraction, the image is heavily distorted; the pinhole camera model alone cannot be used and a distortion correction step is required. By directly modeling the geometry of the refractive media, we build the image generation process by tracing individual light rays from the camera to a target. Comparing the generated images to their distorted (observed) counterparts, we estimate the geometry parameters of the refractive surface via model inversion by employing an RBF neural network. We present an image collection methodology that produces data suited for finding the distortion parameters and test our algorithm on synthetic and real-world data. We analyze the results of the algorithm.
    Comment: Accepted to ICANN 201
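
    The core geometric operation implied by the ray-tracing step above is refraction at an interface. A minimal sketch of Snell's law in vector form is given below; the paper models the full refractive surface and inverts it with an RBF network, whereas this shows only a single refraction event, with all names chosen for illustration.

```python
# Single refraction event via Snell's law in vector form; illustrative only.
import numpy as np

def refract(direction, normal, n1, n2):
    """Refract a ray going from medium with index n1 into medium with index n2.

    'direction' is the incoming ray direction; 'normal' is the surface normal
    pointing towards the incoming ray. Returns the refracted unit direction,
    or None in the case of total internal reflection.
    """
    d = direction / np.linalg.norm(direction)
    n = normal / np.linalg.norm(normal)
    eta = n1 / n2
    cos_i = -np.dot(n, d)
    sin2_t = eta ** 2 * (1.0 - cos_i ** 2)
    if sin2_t > 1.0:
        return None                                # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return eta * d + (eta * cos_i - cos_t) * n     # refracted unit direction
```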